[bugfix] fix deepseek rope sincoscache re-generation#2744
[bugfix] fix deepseek rope sincoscache re-generation#2744MengqingCao merged 1 commit intovllm-project:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request correctly addresses a bug in the DeepSeek RoPE implementation where the sincoscache was being regenerated unnecessarily for sequences longer than the initial max_position_embeddings. The fix involves pre-allocating a larger cache during initialization by incorporating the scaling_factor, and removing the dynamic resizing logic from the forward pass. These changes are consistently applied across both the standard and torchair implementations, and the corresponding obsolete tests are correctly removed. My review identifies a minor but important point for robustness: the calculation of max_seq_len results in a float, which could lead to implicit behavior dependencies. I've suggested using math.ceil to ensure it's an integer, which improves clarity and aligns with the expectations of parent classes.
There was a problem hiding this comment.
The scaling_factor is a float, which results in self.max_seq_len being a float. While torch.arange can handle a float end value, it's safer and clearer to use an integer for a value representing a sequence length. This avoids reliance on the implicit behavior of torch.arange with floats and improves code robustness. The parent class RotaryEmbedding also expects an int for the sequence length in its _set_cos_sin_cache method signature. Using math.ceil will ensure the allocated cache is large enough and the length is an integer.
| self.max_seq_len = max_position_embeddings * scaling_factor | |
| self.max_seq_len = math.ceil(max_position_embeddings * scaling_factor) |
There was a problem hiding this comment.
The scaling_factor is a float, making self.max_seq_len a float. It is better practice to use an integer for sequence lengths to avoid relying on the implicit behavior of torch.arange with float inputs and to make the code more explicit and robust. The corresponding method in the parent RotaryEmbedding class also expects an integer. Using math.ceil ensures the length is an integer and the cache size is sufficient.
| self.max_seq_len = max_position_embeddings * scaling_factor | |
| self.max_seq_len = math.ceil(max_position_embeddings * scaling_factor) |
|
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according Contributing and Testing. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2744 +/- ##
==========================================
- Coverage 73.71% 72.78% -0.94%
==========================================
Files 152 154 +2
Lines 21967 21313 -654
==========================================
- Hits 16194 15513 -681
- Misses 5773 5800 +27
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
this is a cherry-pick from #1551 |
f1cdcde to
6b4f8f0
Compare
Signed-off-by: zzzzwwjj <1183291235@qq.com>
### What this PR does / why we need it? The current implementation will result in duplicate generation of `sin_cos_cache` in rope when `kv_seqlen` > 4k, because the initialization length of the `sin_cos_cache` is only 4k. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? After this PR merged, sin_cos_cache will not increase in forward func, so `test_native_rope_deepseek_forward_cache_handling` is not necessary. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@60f0843 Signed-off-by: zzzzwwjj <1183291235@qq.com> Signed-off-by: 1Fire4 <wangdingyi2@huawei.com>
### What this PR does / why we need it? The current implementation will result in duplicate generation of `sin_cos_cache` in rope when `kv_seqlen` > 4k, because the initialization length of the `sin_cos_cache` is only 4k. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? After this PR merged, sin_cos_cache will not increase in forward func, so `test_native_rope_deepseek_forward_cache_handling` is not necessary. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@60f0843 Signed-off-by: zzzzwwjj <1183291235@qq.com>
### What this PR does / why we need it? The current implementation will result in duplicate generation of `sin_cos_cache` in rope when `kv_seqlen` > 4k, because the initialization length of the `sin_cos_cache` is only 4k. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? After this PR merged, sin_cos_cache will not increase in forward func, so `test_native_rope_deepseek_forward_cache_handling` is not necessary. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@60f0843 Signed-off-by: zzzzwwjj <1183291235@qq.com> Signed-off-by: offline0806 <z00858301@china.huawei.com>
### What this PR does / why we need it? The current implementation will result in duplicate generation of `sin_cos_cache` in rope when `kv_seqlen` > 4k, because the initialization length of the `sin_cos_cache` is only 4k. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? After this PR merged, sin_cos_cache will not increase in forward func, so `test_native_rope_deepseek_forward_cache_handling` is not necessary. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@60f0843 Signed-off-by: zzzzwwjj <1183291235@qq.com>
### What this PR does / why we need it? The current implementation will result in duplicate generation of `sin_cos_cache` in rope when `kv_seqlen` > 4k, because the initialization length of the `sin_cos_cache` is only 4k. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? After this PR merged, sin_cos_cache will not increase in forward func, so `test_native_rope_deepseek_forward_cache_handling` is not necessary. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@60f0843 Signed-off-by: zzzzwwjj <1183291235@qq.com>
### What this PR does / why we need it? The current implementation will result in duplicate generation of `sin_cos_cache` in rope when `kv_seqlen` > 4k, because the initialization length of the `sin_cos_cache` is only 4k. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? After this PR merged, sin_cos_cache will not increase in forward func, so `test_native_rope_deepseek_forward_cache_handling` is not necessary. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@60f0843 Signed-off-by: zzzzwwjj <1183291235@qq.com>
### What this PR does / why we need it? The current implementation will result in duplicate generation of `sin_cos_cache` in rope when `kv_seqlen` > 4k, because the initialization length of the `sin_cos_cache` is only 4k. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? After this PR merged, sin_cos_cache will not increase in forward func, so `test_native_rope_deepseek_forward_cache_handling` is not necessary. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@60f0843 Signed-off-by: zzzzwwjj <1183291235@qq.com> Signed-off-by: nsdie <yeyifan@huawei.com>
### What this PR does / why we need it? The current implementation will result in duplicate generation of `sin_cos_cache` in rope when `kv_seqlen` > 4k, because the initialization length of the `sin_cos_cache` is only 4k. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? After this PR merged, sin_cos_cache will not increase in forward func, so `test_native_rope_deepseek_forward_cache_handling` is not necessary. - vLLM version: v0.10.1.1 - vLLM main: vllm-project/vllm@60f0843 Signed-off-by: zzzzwwjj <1183291235@qq.com>
What this PR does / why we need it?
The current implementation will result in duplicate generation of
sin_cos_cachein rope whenkv_seqlen> 4k, because the initialization length of thesin_cos_cacheis only 4k.Does this PR introduce any user-facing change?
No.
How was this patch tested?
After this PR merged, sin_cos_cache will not increase in forward func, so
test_native_rope_deepseek_forward_cache_handlingis not necessary.